CLAIMS



[Claim(s)] 

[Claim 1] Visual equipment comprising: a media input section that inputs a sound signal and
a video signal; a character generation means that converts the sound signal received by said
media input section into character information; and a display means that displays on a
screen said character information generated by said character generation means.

[Claim 2] Visual equipment comprising: a media input section that inputs a sound signal and
a video signal; a speech recognition means that recognizes semantic content from the sound
signal received by said media input section; a sign language picture generation means that
converts the semantic content recognized by said speech recognition means into an animation
picture of sign language; and a display means that displays on a screen the image
information generated by said sign language picture generation means.

[Claim 3] Visual equipment comprising: a media input section that inputs a sound signal and
a video signal; a speech recognition means that recognizes semantic content from the sound
signal received by said media input section; a position specification means that pinpoints a
speaker's position from said sound signal or video signal; a sign language picture
generation means that converts the semantic content recognized by said speech recognition
means into an animation picture of sign language; and a display means that displays on a
screen the image information generated by said sign language picture generation means at a
position corresponding to the speaker's position pinpointed by said position specification
means.

[Claim 4] Visual equipment comprising: a media input section that inputs a sound signal and
a video signal; a speech recognition means that recognizes semantic content from the sound
signal received by said media input section; a speaker identification means that
discriminates a speaker from said sound signal or video signal; a sign language picture
generation means that converts the semantic content recognized by said speech recognition
means into an animation picture of sign language; and a display means that, in the image
information generated by said sign language picture generation means, changes the type of
the sign language animation according to each speaker discriminated by said speaker
identification means and displays it on a screen.



DETAILED DESCRIPTION 



[Detailed Description of the Invention] 
[0001] 



[Industrial Application] This invention relates to visual equipment such as television and
video, and in particular to visual equipment suitable for deaf-mute persons.

[0002] 

[Description of the Prior Art] In a television, the representative example of conventional
visual equipment of this kind, the radio wave picked up by the receiving antenna 1 is
supplied to a tuner 2 as shown in Drawing 8, where it is amplified and frequency-converted,
and a channel is selected according to the channel selection remote control 3. The image
circuit 4 amplifies and detects the intermediate frequency, separates the voice intermediate
frequency, the chrominance subcarrier, and the synchronizing signal, then amplifies the
luminance signal, adds the color-difference signals, and supplies a predetermined video
signal to the electrodes of the picture tube 5. The voice circuit 6 amplifies and FM-detects
the voice intermediate frequency, demodulates it into a sound signal, and supplies it to a
loudspeaker 7. The synchronization and deflection circuit 8 separates the synchronizing
signal, synchronizes the vertical and horizontal oscillators, generates sawtooth currents,
and supplies them to a deflecting coil 9. Here, the media input section that inputs a sound
signal and a video signal consists of the receiving antenna 1, the tuner 2, the channel
selection remote control 3, the image circuit 4, and the voice circuit 6.
[0003] As television broadcasting for deaf-mute persons, closed caption broadcasting, which
displays conversations, announcements, and the like in a program as captions on the
television screen, is carried out in the U.S. In Japan, at present, images accompanied by
captions or sign language corresponding to the sound signal are broadcast only in a very
limited number of programs.

[0004] Moreover, in recent research, speaker-independent speech recognition and the
synthesis of sign language animation pictures corresponding to semantic information have
become feasible (for example, ****, Tanahashi truth, Takeji Sakamoto, and Yoshinao Aoki:
"Construction of the gesture description and a word dictionary for sign language picture
intellectual communication", J76-A, 9, pp.1332-1341 (1993-9)).

[0005] Furthermore, some wired communication systems convert content equivalent to the voice
service into character information for deaf-mute persons and superimpose it on the screen
(for example, JP,62-48890,A).
[0006] 

[Problem(s) to be Solved by the Invention] However, with the conventional composition
described above, unless the provider of the image media separately adds information
equivalent to the voice in advance, as in closed caption broadcasting or broadcasting with
captions and sign language, deaf-mute persons cannot understand the content of utterances
such as conversations and announcements at all.

[0007] This invention solves the above technical problem, and aims to offer visual equipment
with which deaf-mute persons and others can understand the content of utterances merely by
looking at the screen.
[0008] 

[Means for Solving the Problem] In order to solve the above technical problem, the visual
equipment of this invention is equipped with a media input section that inputs a sound
signal and a video signal, a character generation means that converts the sound signal
received by this media input section into character information, and a display means that
displays on a screen the character information generated by this character generation means.

[0009] Alternatively, it is equipped with a media input section that inputs a sound signal
and a video signal, a speech recognition means that recognizes semantic content from the
sound signal received by this media input section, a sign language picture generation means
that converts the semantic content recognized by this speech recognition means into an
animation picture of sign language, and a display means that displays on a screen the image
information generated by this sign language picture generation means.
[0010] Alternatively, it is equipped with a media input section that inputs a sound signal
and a video signal, a speech recognition means that recognizes semantic content from the
sound signal received by this media input section, a position specification means that
pinpoints a speaker's position from this sound signal or video signal, a sign language
picture generation means that converts the semantic content recognized by this speech
recognition means into an animation picture of sign language, and a display means that
displays on a screen the image information generated by this sign language picture
generation means at a position corresponding to the speaker's position pinpointed by the
position specification means.
[0011] Alternatively, it is equipped with a media input section that inputs a sound signal
and a video signal, a speech recognition means that recognizes semantic content from the
sound signal received by this media input section, a speaker identification means that
discriminates a speaker from this sound signal or video signal, a sign language picture
generation means that converts the semantic content recognized by the speech recognition
means into an animation picture of sign language, and a display means that, in the image
information generated by this sign language picture generation means, changes the type of
the sign language animation according to each speaker discriminated by the speaker
identification means and displays it on a screen.
[0012] 

[Function] With the above composition, this invention immediately displays on the screen, as
characters or as an animation picture of sign language, the content of utterances such as
conversations and announcements in all image media, including television broadcasting.

[0013] In particular, by providing a position specification means that pinpoints a speaker's
position from the sound signal or the video signal, and by changing the display position of
the sign language animation picture according to that position, an intelligible screen with
a sense of presence that corresponds to the original video signal is built.
[0014] Alternatively, by providing a speaker identification means that discriminates a
speaker from the sound signal or the video signal, and by changing the type of the sign
language animation according to each discriminated speaker, an intelligible screen with a
still greater sense of presence is built.
[0015] 

[Example] The 1st example of this invention is explained below with reference to Drawing 1
through Drawing 3. In Drawing 1, components having the same function as those shown in the
conventional example are given the same numbers, and their explanation is partly omitted. As
in the conventional example, in a television, the representative example of visual
equipment, the radio wave picked up by the receiving antenna 1 is first supplied to a tuner
2, where it is amplified and frequency-converted, and a channel is selected according to the
channel selection remote control 3. The image circuit 4 amplifies and detects the
intermediate frequency, separates the voice intermediate frequency, the chrominance
subcarrier, and the synchronizing signal, then amplifies the luminance signal, adds the
color-difference signals, and supplies the video signal corresponding to the basic screen to
the superimposition display means 10. The voice circuit 6 amplifies and FM-detects the voice
intermediate frequency, demodulates it into a sound signal, and supplies it to the
loudspeaker 7, to the character generation means 11, and, through the speech recognition
means 12, to the sign language picture generation means 13. Here, the media input section
that inputs a sound signal and a video signal consists of the receiving antenna 1, the tuner
2, the channel selection remote control 3, the image circuit 4, and the voice circuit 6.

[0016] The character generation means 11 converts the sound signal into character
information. The speech recognition means 12 recognizes semantic content from the sound
signal, and the sign language picture generation means 13 then converts the recognized
semantic content into an animation picture of sign language. The change means 14 is a
switching means that can select either the character picture generated by the character
generation means 11 or the sign language animation picture generated by the sign language
picture generation means 13, or both. The picture selected by the change means 14 is
composited by the superimposition display means 10 so that it is superimposed on part of the
video signal corresponding to the basic screen, and the resulting predetermined video signal
is supplied to the electrodes of the picture tube 5.
[0017] The composition of the character generation means 11 is explained using Drawing 2.
In the acoustic-analysis section 11a, the voice input from the voice circuit 6 is analyzed
acoustically frame by frame, one frame per unit time, and the voice power and the
autocorrelation function are computed. Based on these analysis data, the phoneme recognition
section 11b divides the voice input into vowels and consonants and extracts the phoneme
sequence for each sound by matching against the phoneme standard patterns stored beforehand
in the phoneme rule section 11c. In the word recognition section 11d, a word sequence is
predicted using the word knowledge stored beforehand in the word dictionary 11e, and words
are extracted by matching this prediction against the phoneme sequence. In the syntax
analyzer 11f, consistency as a sentence is checked and corrected using the linguistic
knowledge stored beforehand in the syntax and grammatical rule section 11g. Here, the
phoneme rule section 11c, the word dictionary 11e, and the syntax and grammatical rule
section 11g cooperate with one another and form the knowledge database. In the kanji-kana
mixed sentence character transducer 11h, the sentence extracted by the syntax analyzer 11f
is converted into a Japanese character string of mixed kanji and kana, which the caption
picture generation section 11i renders as an image and outputs.
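
The per-frame analysis attributed to the acoustic-analysis section can be illustrated with a
minimal sketch. It assumes non-overlapping frames and a plain (unnormalized) autocorrelation;
the function name frame_features and its parameters are hypothetical and do not appear in the
specification, which does not state frame length, overlap, or windowing.

```python
def frame_features(samples, frame_len, max_lag):
    """Return one (power, autocorrelation) pair per non-overlapping frame.

    Assumption: frames do not overlap and no window function is applied.
    """
    features = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        # Voice power: mean squared amplitude of the frame.
        power = sum(x * x for x in frame) / frame_len
        # Autocorrelation for lags 0..max_lag within the frame.
        autocorr = [
            sum(frame[n] * frame[n + lag] for n in range(frame_len - lag))
            for lag in range(max_lag + 1)
        ]
        features.append((power, autocorr))
    return features


if __name__ == "__main__":
    signal = [1, -1, 1, -1, 2, 2, 2, 2]
    for power, autocorr in frame_features(signal, frame_len=4, max_lag=2):
        print(power, autocorr)
```

A phoneme recognizer would consume such feature pairs frame by frame; that matching step is
outside the scope of this sketch.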
[0018] On the other hand, the composition of the speech recognition means 12 and the sign
language picture generation means 13 is explained using Drawing 3. As in the character
generation means 11, in the acoustic-analysis section 12a the voice input from the voice
circuit 6 is analyzed acoustically frame by frame, one frame per unit time, and the voice
power and the autocorrelation function are computed. Based on these analysis data, the
phoneme recognition section 12b divides the voice input into vowels and consonants and
extracts the phoneme sequence for each sound by matching against the phoneme standard
patterns stored beforehand in the phoneme rule section 12c. In the word recognition section
12d, a word sequence is predicted using the word knowledge stored beforehand in the word
dictionary 12e, and words are extracted by matching this prediction against the phoneme
sequence. In the syntax analyzer 12f, consistency as a sentence is checked and corrected
using the linguistic knowledge stored beforehand in the syntax and grammatical rule section
12g. Here, the phoneme rule section 12c, the word dictionary 12e, and the syntax and
grammatical rule section 12g cooperate with one another and form the knowledge database. In
the semantic reasoning section 12h, the semantic content corresponding to the sentence
extracted by the syntax analyzer 12f is recognized at the language level and transmitted to
the sign language picture generation means 13 as text information divided into basic units
of meaning.

[0019] In the sign language picture generation means 13, the sign language description
section 13a retrieves the required sign language motion patterns corresponding to this text
information from the sign language words stored beforehand in the sign language word
dictionary 13b and combines them. The sign language picture generation section 13c renders
the sign language motion combined by the sign language description section 13a as an
animation and outputs it.
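
As a rough illustration of the dictionary lookup described for the sign language description
section, the sketch below maps units of meaning to stored motion patterns and concatenates
them. The dictionary entries, the pattern names, and the fingerspelling fallback for unknown
words are all hypothetical placeholders; the specification does not say how unknown words are
handled.

```python
# Hypothetical sign language word dictionary: unit of meaning -> motion patterns.
SIGN_WORD_DICTIONARY = {
    "hello": ["raise_hand", "wave"],
    "thank_you": ["flat_hand_chin", "move_forward"],
}


def compose_sign_motion(semantic_units, dictionary=SIGN_WORD_DICTIONARY):
    """Concatenate the stored motion patterns for each unit of meaning.

    Assumption: a unit missing from the dictionary is marked for fingerspelling.
    """
    motion = []
    for unit in semantic_units:
        motion.extend(dictionary.get(unit, [f"fingerspell:{unit}"]))
    return motion


if __name__ == "__main__":
    print(compose_sign_motion(["hello", "thank_you"]))
```

The resulting list of motion patterns stands in for what an animation renderer would consume.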

[0020] In the above composition, the character generation means 11 converts the sound signal
into character information and the superimposition display means 10 immediately displays
this character information on the television screen, so the content can be understood from
the television screen alone even when there is no voice output from the loudspeaker 7.
Alternatively, the speech recognition means 12 and the sign language picture generation
means 13 convert the sound signal into an animation picture of sign language and the
superimposition display means 10 immediately displays this image information on the
television screen, so that even rapid speech, which becomes difficult to read when shown
only as characters, can be conveyed visually, quickly, and concisely. This is also less
tiring, particularly when the screen size of the picture tube 5 is small.
[0021] Next, the 2nd example of this invention is explained with reference to Drawing 4 and
Drawing 5. In Drawing 4, the difference from the 1st example of this invention is that there
is no character generation means 11 or change means 14, and that a position specification
means 15, which pinpoints a speaker's position from the video signal and the sound signal,
and a superimposition display means 16, which displays on the screen at a position
corresponding to this speaker's position, are newly provided. Here, the voice circuit 6 is
assumed to output a stereo voice signal with independent left and right channels.

[0022] Since the rest of the composition is the same as in the 1st example, its explanation
is omitted, and only the composition of the position specification means 15 is explained
using Drawing 5. The utterance signal extraction section 15a is a filter that extracts only
the utterance signal from the voice input from the voice circuit 6. For the sound signal
that has passed through the utterance signal extraction section 15a, the volume detecting
element 15b detects the volume, and the direction detecting element 15c computes the
direction of the speaker in the sound field from the left-right volume balance and its
change. Meanwhile, the human body position detecting element 15d detects the human body
position on the two-dimensional screen by image processing of the image input from the image
circuit 4, and the labial operation detecting element 15e further detects lip movement in
the detected human body, so that the speaker's position on the screen is computed. In the
position judging section 15f, based on the inputs from the volume detecting element 15b, the
direction detecting element 15c, and the labial operation detecting element 15e, the
speaker's position in a virtual three-dimensional space is estimated, and this positional
information is passed to the superimposition display means 16.
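
The left-right volume balance used by the direction detecting element can be sketched as
follows. This is only one plausible reading: it assumes RMS level as the measure of volume
and a normalized difference as the balance, neither of which is specified in the text, and
the function names rms and horizontal_direction are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square level of one channel (assumed measure of volume)."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def horizontal_direction(left, right):
    """Return a balance in [-1, 1]: -1 is fully left, +1 is fully right.

    Assumption: silence on both channels is reported as centered (0.0).
    """
    level_left, level_right = rms(left), rms(right)
    total = level_left + level_right
    if total == 0:
        return 0.0
    return (level_right - level_left) / total


if __name__ == "__main__":
    print(horizontal_direction([3, 3], [1, 1]))
```

Tracking how this value changes over successive frames would correspond to the "change" in
balance that the text mentions; combining it with the lip-movement position from the video
signal is left out of the sketch.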

[0023] The superimposition display means 16 changes the on-screen display position of the
superimposed sign language animation picture according to the speaker's positional
information specified by the position specification means 15. Within the two-dimensional
superimposition area, depth is expressed by the size of the animation picture.
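
One way to picture how position and depth could map to the superimposed area is the sketch
below. Everything numeric in it is an assumption made for illustration: the normalized
coordinates, the 640x480 screen, the base size of 160 pixels, and the minimum scale of 0.4
are not taken from the specification, and overlay_rect is a hypothetical name.

```python
def overlay_rect(x, y, depth, screen_w=640, screen_h=480, base=160, min_scale=0.4):
    """Return (left, top, width, height) of the sign language animation area.

    x, y: speaker position on screen, normalized to [0, 1].
    depth: 0.0 for the nearest speaker, 1.0 for the farthest.
    A farther speaker gets a smaller square, which is how depth is expressed.
    """
    scale = 1.0 - (1.0 - min_scale) * depth
    w = h = round(base * scale)
    # Center the area on the speaker, then clamp it to stay inside the screen.
    left = min(max(round(x * screen_w - w / 2), 0), screen_w - w)
    top = min(max(round(y * screen_h - h / 2), 0), screen_h - h)
    return left, top, w, h


if __name__ == "__main__":
    print(overlay_rect(0.5, 0.5, 0.0))
    print(overlay_rect(1.0, 0.5, 1.0))
```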

[0024] In the above composition, the display position of the sign language animation picture
changes immediately according to the speaker's position pinpointed by the position
specification means 15. For example, if a speaker who, judging from the sound signal, is on
the right begins to talk, the sign language animation appears toward the right of the screen
and begins sign language motion matching the content of the talk. If the speaker moves to
the left while talking, the sign language animation also moves to the left on the screen.
The same applies to the vertical direction and the depth direction. In other words, a sense
of presence is produced, and the speaker's positional information, which would be lost if
the content of the voice were merely replaced by sign language, can be incorporated into the
screen.

[0025] Next, the 3rd example of this invention is explained with reference to Drawing 6 and
Drawing 7. In Drawing 6, the difference from the 2nd example of this invention is that there
is no position specification means 15 for pinpointing a speaker's position, and that a
speaker identification means 17, which discriminates a speaker from the video signal, and a
display means 18, which changes the type of the sign language animation according to this
speaker and displays it on the screen, are newly provided. The speaker identification means
17 extracts all persons from the video signal input from the image circuit 4, discriminates
the speaker from among these persons, and thereby causes the type of the animation to be
changed.

[0026] The composition of the speaker identification means 17 is explained using Drawing 7.
The utterance signal extraction section 17a is a filter that extracts only the utterance
signal from the voice input from the voice circuit 6. For the sound signal that has passed
through the utterance signal extraction section 17a, the volume detecting element 17b, the
pitch detecting element 17c, the tone detecting element 17d, and the speed detector 17e
detect, respectively, the volume, the pitch, and the tone of the utterance signal and the
speaker's speech speed. Meanwhile, the human body configuration detecting element 17f
detects the human body configuration on the two-dimensional screen by image processing of
the image input from the image circuit 4, and the labial operation detecting element 17g and
the feature-extraction section 17h further compute the speaker's characteristic quantities
obtained from the picture. Based on the outputs of the volume detecting element 17b, the
pitch detecting element 17c, the tone detecting element 17d, the speed detector 17e, the
human body configuration detecting element 17f, and the labial operation detecting element
17g, the speaker judging section 17i classifies the speaker's characteristic quantities into
one of, for example, 20 patterns prepared beforehand, and this classification information is
passed to the superimposition display means 18. The classification is realized by mapping
each individual's feature vector into a multi-dimensional space using a multivariate
analysis technique such as principal component analysis, or a neural network technique such
as learning vector quantization. The superimposition display means 18 immediately changes
the type of the superimposed sign language animation according to each speaker discriminated
by the speaker identification means 17.
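
The final classification step can be illustrated with a nearest-prototype rule, which is the
decision rule applied once a learning vector quantization codebook has been trained. The
sketch below shows only that decision step, not the training, and it is a simplified
stand-in rather than the method of the specification; the function name classify_speaker and
the example prototypes are hypothetical.

```python
def classify_speaker(feature, prototypes):
    """Return the index of the prototype pattern nearest to the feature vector.

    Assumption: squared Euclidean distance over equally weighted features.
    """
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return min(range(len(prototypes)), key=lambda i: sq_dist(feature, prototypes[i]))


if __name__ == "__main__":
    # Three illustrative patterns; the text mentions about 20 prepared patterns.
    patterns = [(0, 0), (10, 10), (0, 10)]
    print(classify_speaker((9, 8), patterns))
```

The returned index would select which type of sign language animation to superimpose for the
current speaker.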

[0027] In the above composition, the type of the sign language animation changes immediately
according to each speaker discriminated by the speaker identification means 17. In other
words, a sense of presence is produced, and each speaker's individual features, which would
be lost if the content of the voice were merely replaced by sign language, can be
incorporated into the screen.
[0028] Although the 1st to 3rd examples were explained taking television as the case, this
invention can be applied to all visual equipment that outputs a sound signal and a video
signal, and does not depend on the kind of sound signal or video signal. Although the
superimposition display means 10, 16, and 18 are used as the means of displaying on the
screen, the display may instead be divided off into a sub-screen and shown independently, or
the newly generated character information and sign language animation picture may be
projected on a separate screen. The screen need not be a picture tube 5 such as a CRT.
Moreover, the display screen may be connected by a wired or wireless communication line and
installed at a place distant from the other components.

[0029] Furthermore, the target language of the character generation means 11 is not
restricted to Japanese. The input signal to the position specification means 15 or the
speaker identification means 17 need not combine both a sound signal and a video signal. The
speaker categories discriminated by the speaker identification means 17 may be as few as
two, such as the two sexes. Conversely, the size, shape, facial expression, motion, type,
and so on of the animation may be adjusted continuously according to the speaker's
characteristic quantities. The characteristic quantities of the voice may also be converted
into affective information such as color and brightness, to vary the expression of the sign
language animation.
[0030] 

[Effect of the Invention] As described above, the visual equipment of this invention
provides the following effects.

[0031] (1) Deaf-mute persons, and those who want to know the content of an image without
sound, can understand the content of utterances merely by looking at the screen. Moreover,
the equipment can be used as is with the huge body of image media accumulated to date and
with highly urgent news and relay programs.

[0032] (2) When the content of utterances is displayed as an animation picture of sign
language, even rapid speech, which becomes difficult to read when shown only as characters,
can be conveyed visually, quickly, and concisely. For a person who has once mastered it,
sign language is an easy-to-understand means of communication, and it is less tiring when
the display screen is small.

[0033] (3) Furthermore, an intelligible screen with a sense of presence can be built by
displaying the sign language animation picture on the screen at a position corresponding to
the specified speaker's position.

[0034] (4) Alternatively, an intelligible screen with a greater sense of presence can be
built by changing the type of the sign language animation picture according to each speaker.

[Brief Description of the Drawings]

[Drawing 1] The block diagram of the visual equipment in the 1st example of this invention
[Drawing 2] The block diagram of the character generation means in this example
[Drawing 3] The block diagram of the speech recognition means and the sign language picture
generation means in this example
[Drawing 4] The block diagram of the visual equipment in the 2nd example of this invention
[Drawing 5] The block diagram of the position specification means in this example
[Drawing 6] The block diagram of the visual equipment in the 3rd example of this invention
[Drawing 7] The block diagram of the speaker identification means in this example
[Drawing 8] The block diagram of the conventional visual equipment